Estimating latent infections

A retrospective


Ryan J. Tibshirani

Daniel J. McDonald, Rachel Lobay, and CMU’s Delphi Group

CDC Flu – 18 December 2023

Goal

Use reported cases to estimate actual infections

  • For every US state
  • From March 1, 2020 to January 1, 2022
  • Provide an “authoritative” estimate with uncertainty
  • No compartmental models, no sampling or Bayesian methods

Retrospective deconvolution

  • Based on prior work (Jahja et al. 2021)
  • Take reported cases and deconvolve them to find when symptoms began
  • Use a private CDC linelist to estimate the delay from symptom onset to case report
    • Different delay distribution for every report date and state
  • Combine with literature estimates of the delay from infection to symptom onset
    • Variant-specific
    • Prevailing variant mix taken from GISAID
  • Convolve both to get delay distribution from infection to case report
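As a rough illustration of the last step, the sketch below convolves two daily delay distributions, each a probability mass function (PMF) indexed by delay in days, to get the infection-to-report delay. It assumes the two delays are independent. The function name and the input numbers are hypothetical; in the pipeline described above, the incubation PMF would come from the literature and the reporting PMF from the linelist.

```python
def convolve_pmfs(p_incubation, p_reporting):
    """PMF of the total delay when two independent daily delays are added.

    p_incubation[j] = P(infection -> symptom onset takes j days)
    p_reporting[i]  = P(symptom onset -> case report takes i days)
    Returns p[k] = sum_j p_incubation[j] * p_reporting[k - j].
    """
    out = [0.0] * (len(p_incubation) + len(p_reporting) - 1)
    for j, a in enumerate(p_incubation):
        for i, b in enumerate(p_reporting):
            out[i + j] += a * b
    return out


# Hypothetical, illustrative PMFs (not estimates from the talk).
p_inc = [0.0, 0.2, 0.5, 0.3]   # infection -> symptom onset
p_rep = [0.1, 0.6, 0.3]        # symptom onset -> case report
p_total = convolve_pmfs(p_inc, p_rep)
print([round(v, 3) for v in p_total])
```

Because each report date and state has its own reporting PMF, this convolution would be repeated per day and state.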

Empirical delay distributions

  • Specific to each day and state, using the private CDC linelist
  • Method of moments to fit a gamma density
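A method-of-moments gamma fit matches the sample mean and variance: for a gamma with shape \(\alpha\) and scale \(\theta\), the mean is \(\alpha\theta\) and the variance is \(\alpha\theta^2\), so \(\alpha = m^2/v\) and \(\theta = v/m\). The sketch below does this for one day and state, then turns the density into a daily PMF by evaluating it at each day's midpoint and renormalizing; that discretization step, the function names, and the delay data are assumptions for illustration.

```python
import math
import statistics


def fit_gamma_mom(delays):
    """Method-of-moments gamma fit: shape = mean^2 / var, scale = var / mean."""
    m = statistics.fmean(delays)
    v = statistics.pvariance(delays)
    return m * m / v, v / m


def discretize_gamma(shape, scale, max_delay):
    """Daily PMF on 0..max_delay from the gamma density at k + 0.5, renormalized.

    A simple approximation (assumed here), not an exact interval-probability calculation.
    """
    log_dens = [
        (shape - 1) * math.log(k + 0.5) - (k + 0.5) / scale
        - math.lgamma(shape) - shape * math.log(scale)
        for k in range(max_delay + 1)
    ]
    weights = [math.exp(ld) for ld in log_dens]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical onset-to-report delays (days) for one state and report date.
delays = [2, 4, 4, 6]
shape, scale = fit_gamma_mom(delays)
pmf = discretize_gamma(shape, scale, max_delay=14)
```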

Cases are selectively reported to CDC

  • CDC linelist contains both onset and report dates
  • Shrink the state parameters proportionally toward the national estimate (each day)
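The slide does not spell out the shrinkage rule. One simple way to shrink a state's parameters toward the national ones, sketched here purely as an assumption, is a convex combination whose weight on the state grows with the number of linelist records for that state and day. The pseudo-count `n0` and the function name are hypothetical.

```python
def shrink_toward_national(state_params, national_params, n_state, n0):
    """Convex combination of state and national parameters (hypothetical rule).

    The weight on the state estimate is n_state / (n_state + n0), so a state with
    few linelist records on a given day is pulled strongly toward the national fit.
    """
    w = n_state / (n_state + n0)
    return tuple(w * s + (1 - w) * g for s, g in zip(state_params, national_params))


# Hypothetical (shape, scale) gamma parameters for one report date.
state = (8.0, 0.5)
national = (4.0, 1.0)
print(shrink_toward_national(state, national, n_state=100, n0=100))
```

Shrinking (shape, scale) directly is only one option; shrinking the mean and variance and then converting back would be another.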

Variant mix over time - from GISAID

Incubation period

  • Literature estimates for each variant
  • Distribution is a mixture over variants, weighted by the day- and state-specific variant mix
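If a fraction \(w_v\) of infections on a given day and state are of variant \(v\), the incubation PMF for a randomly chosen infection is the mixture \(\sum_v w_v\, p_v(k)\). The sketch below computes that mixture; the variant shares and per-variant PMFs are made-up numbers, not literature values.

```python
def mix_incubation(variant_share, incubation_pmfs):
    """Incubation PMF as a mixture over variants, weighted by their share.

    variant_share: {variant: proportion}, proportions summing to 1
    incubation_pmfs: {variant: [P(incubation = 0 days), P(= 1 day), ...]}
    """
    length = max(len(incubation_pmfs[v]) for v in variant_share)
    mixed = [0.0] * length
    for variant, share in variant_share.items():
        for k, p in enumerate(incubation_pmfs[variant]):
            mixed[k] += share * p
    return mixed


# Hypothetical shares and PMFs for illustration only.
share = {"delta": 0.25, "omicron": 0.75}
pmfs = {"delta": [0.0, 0.5, 0.5], "omicron": [0.5, 0.5, 0.0]}
print(mix_incubation(share, pmfs))  # [0.375, 0.5, 0.125]
```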

Convolved distribution – Infection to case report

From incident cases to incident infection onset

  • Call this “deconvolved cases”
  • Shift cases back in time using the convolved infection-to-report delay distribution
\[\begin{aligned} \mathop{\mathrm{minimize}}_{x}\ \sum_{t = 1}^n \left ( y_t - \sum_{k = 1}^{d} \hat{p}_t(k)x_{t-k} \right )^2 + \lambda \|D^{(4)}x\|_1. \end{aligned}\]
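  • \(x_t\) is deconvolved cases on day \(t\); \(y_t\) is reported cases on day \(t\)
  • \(\hat{p}_t(k)\) is the estimated probability that a case reported on day \(t\) was infected \(k\) days earlier
  • \(D^{(4)}\) is a 4th-order difference matrix; the \(\ell_1\) penalty makes this a trend filtering problem
  • Result is a smooth estimate of the deconvolved cases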
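The \(\ell_1\) trend filtering problem above needs an iterative convex solver. To make the convolution matrix and the difference matrix concrete without dependencies, the sketch below swaps in a squared \(\ell_2\) penalty \(\lambda\|D^{(4)}x\|_2^2\), which has the closed-form solution \((P^\top P + \lambda D^\top D)x = P^\top y\). This is a different estimator (it does not adapt its smoothness locally the way the \(\ell_1\) penalty does) and all function names are hypothetical.

```python
import math


def difference_matrix(n, order):
    """Rows of the order-th difference operator D^(order) on a grid of length n."""
    rows = []
    for i in range(n - order):
        row = [0.0] * n
        for j in range(order + 1):
            row[i + j] = (-1) ** (order - j) * math.comb(order, j)
        rows.append(row)
    return rows


def convolution_matrix(pmfs):
    """P[t][t-k] = p_t(k) for k = 1..d, so (P x)_t = sum_k p_t(k) x_{t-k}; x before day 0 is 0."""
    n = len(pmfs)
    P = [[0.0] * n for _ in range(n)]
    for t, p in enumerate(pmfs):
        for k in range(1, len(p)):
            if t - k >= 0:
                P[t][t - k] += p[k]
    return P


def cross(A, B):
    """A^T B for row-major matrices with the same number of rows."""
    return [[sum(A[r][i] * B[r][j] for r in range(len(A))) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def deconvolve_l2(y, pmfs, lam, order=4):
    """Minimize sum_t (y_t - sum_k p_t(k) x_{t-k})^2 + lam * ||D^(order) x||_2^2."""
    n = len(y)
    P = convolution_matrix(pmfs)
    D = difference_matrix(n, order)
    PtP, DtD = cross(P, P), cross(D, D)
    A = [[PtP[i][j] + lam * DtD[i][j] for j in range(n)] for i in range(n)]
    b = [sum(P[r][i] * y[r] for r in range(n)) for i in range(n)]
    return solve(A, b)


# Synthetic check: a smooth curve pushed through a made-up delay PMF, then recovered.
n = 30
pmf = [0.0, 0.2, 0.5, 0.3]  # hypothetical p_t(k), identical for every t here
x_true = [100 + 2 * t - 0.05 * t * t for t in range(n)]
P_demo = convolution_matrix([pmf] * n)
y = [sum(P_demo[t][s] * x_true[s] for s in range(n)) for t in range(n)]
x_hat = deconvolve_l2(y, [pmf] * n, lam=10.0)
```

A quadratic (or cubic) curve has zero 4th differences, so with noise-free \(y\) both terms of the objective vanish at the true curve and it is recovered exactly; with real, noisy counts the penalty trades fit for smoothness.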

Deconvolve cases by their delay distribution

From deconvolved cases to circulating infections

  • Use serology to estimate the proportion of infections that are reported
  • “Leaky immunity” model
\[\begin{aligned} s_t = (1-\gamma)s_{t-1} + a_t (C_t - C_{t-1}) z_t + \epsilon_t \end{aligned}\]
  • \(s_t\) is population immunity
  • \(\gamma\) is the fraction of the immune population that loses immunity between \(t-1\) and \(t\)
  • \(a_t\) is the inverse reporting ratio (infections per reported case)
  • \(C_t\) is cumulative (deconvolved) cases at time \(t\)
  • \(z_t\) is the estimated fraction of cases that are first infections (from the literature)
  • \(\epsilon_t\) is Gaussian noise.
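Ignoring the noise term \(\epsilon_t\), the leaky immunity recursion can be run forward directly. The sketch below assumes \(s_t\) and \(C_t\) are both expressed as fractions of the state population (the slide does not state units); the function name and inputs are hypothetical.

```python
def leaky_immunity(s0, gamma, a, C, z):
    """Noise-free s_t = (1 - gamma) s_{t-1} + a_t (C_t - C_{t-1}) z_t.

    a, C, z are indexed by t = 0..n-1; a[0] and z[0] are unused.
    Assumes s and C are fractions of the population.
    """
    s = [s0]
    for t in range(1, len(C)):
        # a_t * (new deconvolved cases) = new infections; times z_t = new first infections
        new_first_infections = a[t] * (C[t] - C[t - 1]) * z[t]
        s.append((1 - gamma) * s[-1] + new_first_infections)
    return s


# Hypothetical weekly inputs for illustration.
s = leaky_immunity(
    s0=0.0,
    gamma=0.008,
    a=[0.0, 4.0, 4.0, 4.0],
    C=[0.0, 0.010, 0.025, 0.035],
    z=[0.0, 0.95, 0.90, 0.85],
)
print([round(v, 4) for v in s])
```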

Serology data

  • Two sources, noisy realizations of \(s_t\)
  • Lots of missingness

State space model

\[\begin{aligned} s_t = (1-\gamma)s_{t-1} + a_t (C_t - C_{t-1}) z_t + \epsilon_t \end{aligned}\]
  • Treat \(s_t\) as latent
  • Estimate \(\gamma\), \(a_t\), and the noise variances using a state space model
  • Use the Kalman filter/smoother to compute and maximize the likelihood
  • Handles missingness automatically
  • Imposes smoothness on \(a_t\) (like a spline)
  • Also gives variance estimates for \(a_t\)
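The way a Kalman filter handles missing serology can be seen in a scalar sketch: when an observation is missing, the update step is simply skipped and the prediction carries forward. Here \(u_t = a_t(C_t - C_{t-1})z_t\) is treated as a known input, which is a simplification (the model on the slide also estimates \(a_t\)); each serology source is assumed to observe \(s_t\) plus independent Gaussian noise. Names and numbers are hypothetical.

```python
import math


def kalman_filter(s0, v0, gamma, u, obs, q, r):
    """Scalar Kalman filter for s_t = (1 - gamma) s_{t-1} + u_t + eps_t, eps_t ~ N(0, q).

    obs[t] holds one entry per serology source (None if missing); source j has
    noise variance r[j]. Returns filtered means, variances, and the log-likelihood.
    """
    m, v = s0, v0
    means, variances, loglik = [], [], 0.0
    for t in range(len(u)):
        if t > 0:  # predict step
            m = (1 - gamma) * m + u[t]
            v = (1 - gamma) ** 2 * v + q
        for y, r_j in zip(obs[t], r):
            if y is None:  # missing serology: skip the update
                continue
            innov_var = v + r_j
            gain = v / innov_var
            resid = y - m
            loglik += -0.5 * (math.log(2 * math.pi * innov_var) + resid ** 2 / innov_var)
            m += gain * resid
            v *= 1 - gain
        means.append(m)
        variances.append(v)
    return means, variances, loglik


# Hypothetical weekly inputs with two serology sources and gaps.
u = [0.0, 0.02, 0.02, 0.01]
obs = [(None, None), (0.025, None), (None, None), (0.04, 0.035)]
means, variances, loglik = kalman_filter(0.0, 1e-4, 0.008, u, obs, 1e-5, (1e-4, 4e-4))
```

Passing the negative of `loglik` to a numerical optimizer over \(\gamma\), `q`, and `r` would give maximum likelihood estimates; a backward (Rauch–Tung–Striebel) pass would turn the filtered estimates into smoothed ones.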

Estimated seroprevalence and leaky model parameters

  • \(\gamma\) estimated to be 0.8% per week
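As a back-of-envelope reading of this estimate (treating \(\gamma\) as constant and ignoring new infections), the fraction of the immune population still immune after a year, and the implied half-life, are:

\[\begin{aligned} (1-\gamma)^{52} &= 0.992^{52} \approx 0.66, \\ t_{1/2} &= \frac{\ln 2}{-\ln(1-\gamma)} = \frac{\ln 2}{-\ln 0.992} \approx 86 \text{ weeks}. \end{aligned}\]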

Estimated inverse reporting ratios

Estimated latent infections

Pre-omicron

Omicron (more uncertain, due to serology)
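Since \(a_t\) is the inverse reporting ratio (infections per reported case), one way to form a latent infection estimate is \(a_t\) times the deconvolved incident cases \(C_t - C_{t-1}\). The sketch below does this and attaches a rough normal-approximation 95% band from the variance of \(a_t\) given by the state space model. Treating the deconvolved cases as fixed and using a Gaussian band are simplifying assumptions; the function name and inputs are hypothetical.

```python
import math


def latent_infections(a_hat, a_var, deconvolved_cases, z=1.96):
    """Infections_t = a_t * x_t with a normal-approximation band from Var(a_t).

    Treats the deconvolved cases x_t as known, so sd(infections_t) = x_t * sd(a_t).
    Returns (estimate, lower, upper) for each day.
    """
    out = []
    for a, var, x in zip(a_hat, a_var, deconvolved_cases):
        est = a * x
        half_width = z * math.sqrt(var) * x
        out.append((est, est - half_width, est + half_width))
    return out


# Hypothetical inputs for three days.
print(latent_infections([4.0, 3.5, 3.0], [0.25, 0.16, 0.09], [100.0, 120.0, 150.0]))
```

Wider variance in \(a_t\), as when serology is sparse or noisy, translates directly into wider infection bands.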

Callouts

Final slide

Thanks:

  • The whole CMU Delphi Team (across many institutions)
  • Optum/UnitedHealthcare, Change Healthcare
  • Google, Facebook, Amazon Web Services
  • Quidel, SafeGraph, Qualtrics
  • Centers for Disease Control and Prevention
  • Council of State and Territorial Epidemiologists